yarikoptic [Tue, 11 Mar 2025 15:09:20 +0000 (15:09 +0000)]
Added a comment
Joey Hess [Mon, 10 Mar 2025 21:35:34 +0000 (17:35 -0400)]
expand
Joey Hess [Mon, 10 Mar 2025 20:46:55 +0000 (16:46 -0400)]
response
Joey Hess [Mon, 10 Mar 2025 20:42:24 +0000 (16:42 -0400)]
Merge branch 'master' of ssh://git-annex.branchable.com
Joey Hess [Mon, 10 Mar 2025 20:41:26 +0000 (16:41 -0400)]
added git-annex-compute-singularity
And implemented SANDBOX, which it needs.
Joey Hess [Mon, 10 Mar 2025 19:14:59 +0000 (15:14 -0400)]
compute protocol debugging
Joey Hess [Mon, 10 Mar 2025 18:15:07 +0000 (14:15 -0400)]
document output files must be regular files
Joey Hess [Mon, 10 Mar 2025 17:47:23 +0000 (13:47 -0400)]
make usage an error
Joey Hess [Mon, 10 Mar 2025 16:52:10 +0000 (12:52 -0400)]
compute: disallow output files that are not regular files
Use case where this came up is a compute program using singularity,
where the process inside the container will be allowed to write to the temp
directory, so could make eg a /etc/shadow symlink, which could then be
used to exfiltrate that from the system to wherever the annex object
might be pushed to.
It seemed better to fix this once in git-annex rather than in any such
compute program.
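A minimal sketch of the kind of check described above, in Python rather than git-annex's own code. The function name `is_regular_file` and the demo paths are hypothetical; the only point is that the test must not follow symlinks, so `os.lstat` is used instead of `os.stat`.

```python
import os
import stat
import tempfile


def is_regular_file(path):
    """Return True only if path itself is a regular file.

    os.lstat does not follow symlinks, so a symlink pointing at a
    regular file (eg /etc/shadow) is still rejected.
    """
    try:
        st = os.lstat(path)
    except FileNotFoundError:
        return False
    return stat.S_ISREG(st.st_mode)


def demo():
    """Create one real output file and one symlink, and check both."""
    with tempfile.TemporaryDirectory() as tmp:
        real = os.path.join(tmp, "result.txt")
        with open(real, "w") as f:
            f.write("computed\n")
        link = os.path.join(tmp, "sneaky")
        os.symlink("/etc/shadow", link)  # hypothetical malicious output
        return is_regular_file(real), is_regular_file(link)
```

Here `demo()` accepts the real file and rejects the symlink, whether or not the symlink target exists.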
yarikoptic [Sun, 9 Mar 2025 01:02:55 +0000 (01:02 +0000)]
Added a comment
yarikoptic [Sat, 8 Mar 2025 14:51:20 +0000 (14:51 +0000)]
Added a comment: Any way to annotate what are input files?
Joey Hess [Fri, 7 Mar 2025 21:15:54 +0000 (17:15 -0400)]
symlink, don't hardlink
hardlink can cause problems with unlocked files
Joey Hess [Fri, 7 Mar 2025 21:15:21 +0000 (17:15 -0400)]
disconnect stdio for wasm binaries
Joey Hess [Fri, 7 Mar 2025 20:06:37 +0000 (16:06 -0400)]
use pwd and quote it
Seems more portable and safe
Joey Hess [Fri, 7 Mar 2025 20:03:35 +0000 (16:03 -0400)]
case
Joey Hess [Fri, 7 Mar 2025 20:03:09 +0000 (16:03 -0400)]
layout
Joey Hess [Fri, 7 Mar 2025 20:02:43 +0000 (16:02 -0400)]
layout
Joey Hess [Fri, 7 Mar 2025 20:02:11 +0000 (16:02 -0400)]
add git-annex-compute-wasmedge
Joey Hess [Fri, 7 Mar 2025 20:01:27 +0000 (16:01 -0400)]
redirect command stdout to stderr
Otherwise it will be interpreted as compute program protocol
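To illustrate the idea, here is a hedged Python sketch of a compute program running a helper command with the helper's stdout pointed at stderr. The function name is hypothetical, and this is not the code from the commit; it only shows the technique of keeping stdout clean for protocol messages.

```python
import subprocess
import sys


def run_with_stdout_on_stderr(argv):
    """Run argv with its stdout connected to our stderr (fd 2).

    Anything the command prints then cannot be mistaken for a
    protocol message written on our own stdout.
    """
    return subprocess.run(argv, stdout=2, check=False).returncode


if __name__ == "__main__":
    # The child's "noise" ends up on stderr, not stdout.
    run_with_stdout_on_stderr([sys.executable, "-c", "print('noise')"])
```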
Joey Hess [Fri, 7 Mar 2025 18:57:12 +0000 (14:57 -0400)]
make OUTPUT subdirs
Simplifies compute programs.
Joey Hess [Fri, 7 Mar 2025 18:50:11 +0000 (14:50 -0400)]
Merge branch 'master' of ssh://git-annex.branchable.com
Joey Hess [Fri, 7 Mar 2025 18:47:34 +0000 (14:47 -0400)]
compute: add response to OUTPUT
This allows rejecting output filenames that are outside the repository,
and also handles converting eg "-foo" to "./-foo" to prevent a command
that it's passed to interpreting the output filename as a dashed option.
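The dashed-filename conversion mentioned above can be sketched as follows. This is an illustrative Python function with a hypothetical name, not git-annex's implementation, and it covers only the "-foo" to "./-foo" rewrite, not the rejection of paths outside the repository.

```python
def protect_dashed_name(name):
    """Prefix a filename that starts with a dash with "./".

    The result names the same file, but a command receiving it
    will not parse it as a dashed option.
    """
    if name.startswith("-"):
        return "./" + name
    return name
```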
Joey Hess [Fri, 7 Mar 2025 17:29:57 +0000 (13:29 -0400)]
remove todo I just added
If a compute program does this, it has a security hole. Not git-annex.
Joey Hess [Fri, 7 Mar 2025 17:24:11 +0000 (13:24 -0400)]
todo
jasonb@ab4484d9961a46440958fa1a528e0fc435599057 [Fri, 7 Mar 2025 04:13:24 +0000 (04:13 +0000)]
yarikoptic [Thu, 6 Mar 2025 22:40:35 +0000 (22:40 +0000)]
initial report on slow thaw
Joey Hess [Thu, 6 Mar 2025 18:54:05 +0000 (14:54 -0400)]
improve
Joey Hess [Thu, 6 Mar 2025 18:47:22 +0000 (14:47 -0400)]
add git-annex-compute-imageconvert
Joey Hess [Thu, 6 Mar 2025 18:42:07 +0000 (14:42 -0400)]
prefix output with ./ in example
Joey Hess [Thu, 6 Mar 2025 18:29:07 +0000 (14:29 -0400)]
no longer a draft
Joey Hess [Thu, 6 Mar 2025 18:23:58 +0000 (14:23 -0400)]
Merge branch 'compute'
Joey Hess [Thu, 6 Mar 2025 18:22:45 +0000 (14:22 -0400)]
preparing to merge compute
Joey Hess [Thu, 6 Mar 2025 17:34:51 +0000 (13:34 -0400)]
update
Added a comment: Special use case for Scientific application
Joey Hess [Thu, 6 Mar 2025 16:52:12 +0000 (12:52 -0400)]
update
Joey Hess [Thu, 6 Mar 2025 16:41:30 +0000 (12:41 -0400)]
avoid unnecessary git-annex branch changes for recompute and addcomputed
Joey Hess [Wed, 5 Mar 2025 17:46:06 +0000 (13:46 -0400)]
computation progress display
matrss [Wed, 5 Mar 2025 15:40:44 +0000 (15:40 +0000)]
Added a comment
bpoldrack [Wed, 5 Mar 2025 14:23:57 +0000 (14:23 +0000)]
Added a comment
msz [Wed, 5 Mar 2025 13:35:19 +0000 (13:35 +0000)]
Tag copy_file_range todo with projects/INM7 (came from our cluster)
msz [Wed, 5 Mar 2025 13:31:41 +0000 (13:31 +0000)]
Added a comment: DataLad exploration of the compute on demand space
msz [Wed, 5 Mar 2025 11:27:39 +0000 (11:27 +0000)]
Added a comment
kenta [Wed, 5 Mar 2025 00:00:19 +0000 (00:00 +0000)]
filled out bug description
Joey Hess [Tue, 4 Mar 2025 19:50:15 +0000 (15:50 -0400)]
OsPath build fixes
Joey Hess [Tue, 4 Mar 2025 19:46:30 +0000 (15:46 -0400)]
mark unused parameter
While unused, it seems to make sense to keep it, since it explains what
the function is doing.
Joey Hess [Tue, 4 Mar 2025 19:02:02 +0000 (15:02 -0400)]
update todo
Joey Hess [Tue, 4 Mar 2025 18:54:13 +0000 (14:54 -0400)]
safer git sha object filename
Rather than use the filename provided by INPUT, which could come from user
input, and so could be something that looks like a dashed parameter,
use a .git/object/<sha> filename.
This avoids user input passing through INPUT and back out, with the file
path then passed to a command, which could do something unexpected with
a dashed parameter, or other special parameter.
Added a note in the design about being careful of passing user input to
commands. They still have to be careful of that in general, just not in
this case.
Joey Hess [Tue, 4 Mar 2025 18:06:55 +0000 (14:06 -0400)]
cycle detection
Joey Hess [Tue, 4 Mar 2025 17:13:18 +0000 (13:13 -0400)]
improve error message when unable to get an input file
In this case, the compute program is run the same as if addcomputed --fast
were used, so it should succeed, without outputting a computed file.
computeInputsUnavailable is in ComputeState for simplicity, but it is
not serialized with the rest of the ComputeState.
Joey Hess [Tue, 4 Mar 2025 16:51:38 +0000 (12:51 -0400)]
update location log after getting input file from remote
Joey Hess [Tue, 4 Mar 2025 16:43:50 +0000 (12:43 -0400)]
better wording
Avoids this contradiction:
(Auto enabling special remote foo...)
Not enabling compute special remote c2 because [..]
Joey Hess [Tue, 4 Mar 2025 15:06:58 +0000 (11:06 -0400)]
compute remote: get input files from other remotes
This needed some refactoring to avoid cycles, since Remote.Compute
cannot import Remote.List. Instead, it uses Annex.remotes. Which must be
populated by something else, but we know it has been, because something
is using Remote.Compute, which it must have found in the remote list,
which populates that.
In Remote.Compute, keyPossibilities' is called with all loggedLocations,
without the trustExclude DeadTrusted that keyLocations does. There is
another cycle there. This may be a problem if a dead repository is still
a remote.
This is missing cycle prevention, and it's certainly possible to make 2
files in the compute remote co-depend on one another. Hopefully not in a
real world situation, but an attacker could certainly do it. Cycle
prevention will need to be added to this.
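As an illustration of the problem, co-dependent files can be found with a depth-first walk over the input graph. The sketch below is a generic Python example, not the approach git-annex takes; `deps` is a hypothetical mapping from each computed file to its input files.

```python
def find_cycle(deps, start):
    """Return a list of files forming a dependency cycle reachable
    from start, or None if there is none.

    deps maps a computed file to the list of its input files.
    """
    path = []        # files on the current walk, in order
    on_path = set()  # same files, for fast membership tests
    done = set()     # files already known to be cycle-free

    def visit(node):
        if node in on_path:
            return path[path.index(node):] + [node]
        if node in done:
            return None
        on_path.add(node)
        path.append(node)
        for dep in deps.get(node, []):
            cycle = visit(dep)
            if cycle:
                return cycle
        path.pop()
        on_path.discard(node)
        done.add(node)
        return None

    return visit(start)
```

With `deps = {"a": ["b"], "b": ["a"]}`, `find_cycle(deps, "a")` reports the cycle through both files, while a file whose inputs are plain files yields None.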
Joey Hess [Tue, 4 Mar 2025 14:02:33 +0000 (10:02 -0400)]
move showOutput into compute remote
Joey Hess [Mon, 3 Mar 2025 20:07:04 +0000 (16:07 -0400)]
rename config to annex.security.allowed-compute-programs
And require for enable as well as autoenable.
It seemed asking for trouble for `git-annex enable foo` to use whatever
compute program is stored in the git config, without verifying that the
user wants that program to be used.
Note that it would be good to allow `git-annex enable foo program=...`
to be used without the program being in the git config. Not implemented yet
though.
Joey Hess [Mon, 3 Mar 2025 19:47:09 +0000 (15:47 -0400)]
autoenable security for compute special remote
Added annex.security.autoenable-compute-programs and only allow
autoenabling special remotes that use compute programs on that list.
The reason this is needed is a user might have some compute programs
that are less safe to use than others. They might want to use an unsafe
one only with one repository, where they are the only committer or other
committers are trusted. They might be ok with others being used by any
repository, and if so they can add them to the list.
Another reason would be a user who has installed a compute program by
accident. Eg, it might be included with git-annex at some point, or
pulled in by some dependency. That user doesn't necessarily want that
compute program to be used in an autoenabled special remote.
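The allowlist idea can be sketched like this. It is a Python illustration only: the function name is hypothetical, and treating the configured value as a whitespace-separated list of program names is an assumption, not something stated in the commit message.

```python
def program_allowed(program, allowed_config):
    """Return True if program appears in the configured allowlist.

    allowed_config is assumed (for this sketch) to be a
    whitespace-separated list of compute program names.
    An empty or unset value allows nothing.
    """
    return program in (allowed_config or "").split()
```

Under that assumption, a user who trusts only one program would list just that name, and a special remote using any other compute program would not be enabled automatically.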
Joey Hess [Mon, 3 Mar 2025 19:12:19 +0000 (15:12 -0400)]
recompute: display one of the changed files
Joey Hess [Mon, 3 Mar 2025 18:56:49 +0000 (14:56 -0400)]
avoid recomputing every time on git inputs
Joey Hess [Mon, 3 Mar 2025 15:59:04 +0000 (11:59 -0400)]
support git files as input to computations
Using GIT keys, like are used when exporting git files to special
remotes. Except here the GIT key refers to a file checked into the git
repo.
Note that, since the compute remote uses catObject to get the content,
a symlink that is checked into git does not get followed. This is important
for security, because following a symlink and adding the content to the
repo as an annex object would allow exfiltrating content from outside
the repository.
Instead, the behavior with a symlink is to run the computation on the
symlink target. This may turn out to be confusing, and it might be worth
addcomputed checking if the file in git is a symlink and erroring out.
Or it could follow symlinks as long as the destination is a file in the
repository.
Joey Hess [Mon, 3 Mar 2025 15:08:36 +0000 (11:08 -0400)]
factor out Annex.GitShaKey
Joey Hess [Mon, 3 Mar 2025 14:57:56 +0000 (10:57 -0400)]
record VURL key hashes in addcomputed and recompute
czard [Mon, 3 Mar 2025 12:08:28 +0000 (12:08 +0000)]
Added a comment: Permission fix
Joey Hess [Thu, 27 Feb 2025 20:19:41 +0000 (16:19 -0400)]
record VURL key hashes when getting from compute remote
Like when getting from the web special remote, when the output of the
computation has changed, record the new hash of the content as an
equivalent key for the VURL key.
Still needs to be done for addcomputed and recompute.
Joey Hess [Thu, 27 Feb 2025 20:18:04 +0000 (16:18 -0400)]
fix build
Joey Hess [Thu, 27 Feb 2025 20:17:42 +0000 (16:17 -0400)]
refactor
Joey Hess [Thu, 27 Feb 2025 19:12:29 +0000 (15:12 -0400)]
many recompute improvements
I've lost track of them all, but it includes:
* Using the same key backend as was used in the original computation.
* Fixing bug that prevented updating the source file key in the compute
state
* Handling --reproducible and --unreproducible.
* recompute --original of a file using VURL, when the result is
different, but the key remains the same, makes the object file
be updated with the new content
* Detecting some other ways the program behavior can change, just for
completeness.
* Also adds --backend to addcomputed.
dmcardle [Thu, 27 Feb 2025 19:02:14 +0000 (19:02 +0000)]
Added a comment
Joey Hess [Thu, 27 Feb 2025 18:54:03 +0000 (14:54 -0400)]
refactoring
Joey Hess [Thu, 27 Feb 2025 15:10:44 +0000 (11:10 -0400)]
fix recompute of renamed files
When a computed file has been renamed, a recompute needs to write to the
new filename.
I decided to remove --others because it's not clear what it should do in
the face of renames. Should it update only other files that have not
been renamed? Or update files that use the old key to the new key
anywhere in the tree? Or write the other files to the cwd, ignoring
renames? Since --others is just a way to save on compute time, adding
this complexity at this point seems like a bad idea. May revisit later.
Added temporary TODO-compute file
Joey Hess [Wed, 26 Feb 2025 19:59:47 +0000 (15:59 -0400)]
todo
Joey Hess [Wed, 26 Feb 2025 19:51:31 +0000 (15:51 -0400)]
recompute closer to working properly
Proper behavior without --others implemented.
And eliminated most of the code duplication through refactoring.
Also, changed it to not stage recomputed files. This way, git diff will
show files that have differences.
Joey Hess [Wed, 26 Feb 2025 18:05:37 +0000 (14:05 -0400)]
refactor
Joey Hess [Wed, 26 Feb 2025 15:25:32 +0000 (11:25 -0400)]
started git-annex recompute
The perform action of this still needs work to do the right thing.
In particular, it currently behaves as if --others was always set.
And, it duplicates a lot of code from addcomputed.
Joey Hess [Wed, 26 Feb 2025 13:47:56 +0000 (09:47 -0400)]
showOutput
when the compute program eg displays usage, it needs to start on its own
line
Joey Hess [Wed, 26 Feb 2025 13:45:35 +0000 (09:45 -0400)]
addcomputed inherits extra initremote parameters
This is limited because the remote config is a field/value map. So order
is not preserved, and when 2 parameters have the same field name, only
the last one will be passed.
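The duplicate-field limitation can be seen in a small Python sketch. The parameter names are made up, and a Python dict stands in for the remote config map, so this shows only the "last one wins" effect; a Python dict happens to keep insertion order, which the real map does not promise.

```python
def to_config_map(params):
    """Fold a list of "field=value" parameters into a map.

    When the same field appears twice, the later value replaces
    the earlier one, so only the last one survives.
    """
    config = {}
    for param in params:
        field, _, value = param.partition("=")
        config[field] = value
    return config
```

For example, passing `["passes=2", "format=png", "passes=3"]` leaves a single `passes` entry holding `"3"`.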
Joey Hess [Tue, 25 Feb 2025 22:45:55 +0000 (18:45 -0400)]
todo
Joey Hess [Tue, 25 Feb 2025 22:44:40 +0000 (18:44 -0400)]
add compute remote uuid to compute state url
Otherwise, two different compute remotes that happen to take the same
input would use the same compute state url. Which seems wrong.
Joey Hess [Tue, 25 Feb 2025 21:26:28 +0000 (17:26 -0400)]
wording
Joey Hess [Tue, 25 Feb 2025 21:23:38 +0000 (17:23 -0400)]
update demo program
needed a mkdir
Joey Hess [Tue, 25 Feb 2025 21:10:41 +0000 (17:10 -0400)]
use compute program REPRODUCIBLE by default
Joey Hess [Tue, 25 Feb 2025 21:00:00 +0000 (17:00 -0400)]
ingest when --unreproducible is used without --fast
Joey Hess [Tue, 25 Feb 2025 20:36:22 +0000 (16:36 -0400)]
addcomputed --fast and --unreproducible working
For these, use VURL and URL keys, with an "annex-compute:" URI prefix.
These URL keys will look something like this:
URL--annex-compute&cbar4,63pconvert,3-f4d3d72cf3f16ac9c3e9a8012bde4462
Generally it's too long so most of it gets md5summed. It's a little
ugly, but it's what fell out of the existing URL key generation
machinery. I did consider special casing to eg
"URL--annex-compute&c4d3d72cf3f16ac9c3e9a8012bde4462". But it seems at
least possibly useful that the name of the file that was computed is
visible and perhaps one or two words of the git-annex compute command
parameters.
Note that two different output files from the same computation will get
the same URL key. And these keys should remain stable.
wolf480@8ad1ccdd08efc303a88f7e88c4e629be6637a44e [Tue, 25 Feb 2025 19:58:35 +0000 (19:58 +0000)]
Joey Hess [Tue, 25 Feb 2025 19:45:14 +0000 (15:45 -0400)]
add git-annex addcomputed
Working pretty well. Mostly. But:
* Does not yet support inputs that are non-annexed files checked into git
* --fast is currently broken (will need something like VURL keys)
* --unreproducible still uses a checksumming backend, so drop and get
again will likely fail (needs probably to use an URL key or something
like one)
The compute special remote seems to work pretty well too. Eg,
getting from it works, and dropping content that is present in it works.
wolf480@8ad1ccdd08efc303a88f7e88c4e629be6637a44e [Tue, 25 Feb 2025 19:43:44 +0000 (19:43 +0000)]
create bug report: can't pass spaces in youtube-dl-options
Joey Hess [Tue, 25 Feb 2025 19:08:38 +0000 (15:08 -0400)]
handle computations in subdirs of the git repository
Eg, a computation might be run in "foo/" and refer to "../bar" as an
input or output.
So, the subdir is part of the computation state.
Also, prevent input or output of files that are outside the git
repository. Of course, the program can access any file on disk if it
wants to; this is just a guard against mistakes. And it may also be
useful if the program communicates with something less trusted than it,
eg a container image, so input/output files communicated by that are not
the source of security problems.
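A rough Python sketch of the subdir handling described above. The function name is hypothetical and the check is purely lexical (it does not look at symlinks on disk), so it is an illustration of the guard rather than git-annex's actual logic.

```python
import posixpath


def resolve_in_repo(subdir, name):
    """Resolve name, given relative to subdir, to a path relative
    to the top of the repository.

    subdir is itself relative to the repository top. Returns None
    when the result would fall outside the repository.
    """
    if posixpath.isabs(name):
        return None
    combined = posixpath.normpath(posixpath.join(subdir, name))
    if combined == ".." or combined.startswith("../"):
        return None
    return combined
```

So a computation run in "foo" that refers to "../bar" resolves to "bar" at the top of the repository, while "../../etc/passwd" is rejected.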
Joey Hess [Mon, 24 Feb 2025 20:39:55 +0000 (16:39 -0400)]
add field desc
Joey Hess [Mon, 24 Feb 2025 20:15:46 +0000 (16:15 -0400)]
updated interface
Joey Hess [Mon, 24 Feb 2025 20:15:04 +0000 (16:15 -0400)]
update for new interface
Joey Hess [Mon, 24 Feb 2025 19:48:42 +0000 (15:48 -0400)]
reimplement using new compute program interface
Joey Hess [Mon, 24 Feb 2025 17:48:46 +0000 (13:48 -0400)]
support addcomputed --fast
This complicates the interface but it's still simpler to understand than
the old interface.
Joey Hess [Mon, 24 Feb 2025 16:41:25 +0000 (12:41 -0400)]
new compute program interface
This is much more flexible, and also simpler to understand.
Basile.Pinsard [Mon, 24 Feb 2025 16:36:56 +0000 (16:36 +0000)]
jnkl [Sun, 23 Feb 2025 20:56:00 +0000 (20:56 +0000)]
jnkl [Sun, 23 Feb 2025 20:55:22 +0000 (20:55 +0000)]
jnkl [Sun, 23 Feb 2025 20:54:56 +0000 (20:54 +0000)]
jnkl [Sun, 23 Feb 2025 20:48:35 +0000 (20:48 +0000)]
jnkl [Sun, 23 Feb 2025 20:25:08 +0000 (20:25 +0000)]
Joey Hess [Sat, 22 Feb 2025 14:04:58 +0000 (10:04 -0400)]
Merge branch 'master' of ssh://git-annex.branchable.com
Joey Hess [Sat, 22 Feb 2025 14:04:28 +0000 (10:04 -0400)]
distribits 2025
Atemu [Sat, 22 Feb 2025 10:51:45 +0000 (10:51 +0000)]